Machine Learning with Python and Scikit Learn

Follow the presentation at: http://bit.ly/ML-Notebook

All the code is available at: http://bit.ly/ML-SpringCampus

















What is Machine Learning?

The process of teaching computers to learn from data.

Learning tasks:

  • Clustering

  • Regression

  • Outlier Detection

  • Classification

  • Time series prediction

  • ....













Supervised vs Unsupervised Learning





















Let's look at some code:

We will use the following libraries:


In [1]:
import warnings

import numpy as np
import pandas as pd

from time import time

import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

import Utils
from Utils import cmap_light
from Utils import cmap_bold


Import done









Boston House Prices


In [2]:
boston_dataset = datasets.load_boston()
print(boston_dataset.DESCR)


Boston House Prices dataset
===========================

Notes
------
Data Set Characteristics:  

    :Number of Instances: 506 

    :Number of Attributes: 13 numeric/categorical predictive
    
    :Median Value (attribute 14) is usually the target

    :Attribute Information (in order):
        - CRIM     per capita crime rate by town
        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
        - INDUS    proportion of non-retail business acres per town
        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
        - NOX      nitric oxides concentration (parts per 10 million)
        - RM       average number of rooms per dwelling
        - AGE      proportion of owner-occupied units built prior to 1940
        - DIS      weighted distances to five Boston employment centres
        - RAD      index of accessibility to radial highways
        - TAX      full-value property-tax rate per $10,000
        - PTRATIO  pupil-teacher ratio by town
        - B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
        - LSTAT    % lower status of the population
        - MEDV     Median value of owner-occupied homes in $1000's

    :Missing Attribute Values: None

    :Creator: Harrison, D. and Rubinfeld, D.L.

This is a copy of UCI ML housing dataset.
http://archive.ics.uci.edu/ml/datasets/Housing


This dataset was taken from the StatLib library which is maintained at Carnegie Mellon University.

The Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic
prices and the demand for clean air', J. Environ. Economics & Management,
vol.5, 81-102, 1978.   Used in Belsley, Kuh & Welsch, 'Regression diagnostics
...', Wiley, 1980.   N.B. Various transformations are used in the table on
pages 244-261 of the latter.

The Boston house-price data has been used in many machine learning papers that address regression
problems.   
     
**References**

   - Belsley, Kuh & Welsch, 'Regression diagnostics: Identifying Influential Data and Sources of Collinearity', Wiley, 1980. 244-261.
   - Quinlan,R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings on the Tenth International Conference of Machine Learning, 236-243, University of Massachusetts, Amherst. Morgan Kaufmann.
   - many more! (see http://archive.ics.uci.edu/ml/datasets/Housing)
















The data

Dataset:

A set of examples, each characterized by a number of features; usually there is also a label variable that you want to predict.
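As a minimal sketch of that idea (using made-up numbers, not the Boston data that follows), scikit-learn represents the examples as a 2-D array X of shape (n_samples, n_features) and the labels as a 1-D array y with one entry per example:

import numpy as np

# Three made-up examples, each described by two features
X = np.array([[5.1, 3.5],
              [6.2, 2.9],
              [4.8, 3.0]])   # shape (n_samples, n_features)

# One label per example -- the variable we want to predict
y = np.array([0, 1, 0])      # shape (n_samples,)

print(X.shape, y.shape)      # (3, 2) (3,)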
















Let's see a real example:


In [3]:
X = boston_dataset.data
Y = boston_dataset.target

names = list(boston_dataset.feature_names) + ['Price']

labels = np.reshape(Y,
                     (Y.shape[0], 1))
df = pd.DataFrame(data=np.concatenate((X, labels), axis=1),
                 columns=names)
df.head(10)


Out[3]:
CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT Price
0 0.00632 18.0 2.31 0.0 0.538 6.575 65.2 4.0900 1.0 296.0 15.3 396.90 4.98 24.0
1 0.02731 0.0 7.07 0.0 0.469 6.421 78.9 4.9671 2.0 242.0 17.8 396.90 9.14 21.6
2 0.02729 0.0 7.07 0.0 0.469 7.185 61.1 4.9671 2.0 242.0 17.8 392.83 4.03 34.7
3 0.03237 0.0 2.18 0.0 0.458 6.998 45.8 6.0622 3.0 222.0 18.7 394.63 2.94 33.4
4 0.06905 0.0 2.18 0.0 0.458 7.147 54.2 6.0622 3.0 222.0 18.7 396.90 5.33 36.2
5 0.02985 0.0 2.18 0.0 0.458 6.430 58.7 6.0622 3.0 222.0 18.7 394.12 5.21 28.7
6 0.08829 12.5 7.87 0.0 0.524 6.012 66.6 5.5605 5.0 311.0 15.2 395.60 12.43 22.9
7 0.14455 12.5 7.87 0.0 0.524 6.172 96.1 5.9505 5.0 311.0 15.2 396.90 19.15 27.1
8 0.21124 12.5 7.87 0.0 0.524 5.631 100.0 6.0821 5.0 311.0 15.2 386.63 29.93 16.5
9 0.17004 12.5 7.87 0.0 0.524 6.004 85.9 6.5921 5.0 311.0 15.2 386.71 17.10 18.9

In [4]:
df_tmp = df[['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX',
             'RM', 'Price']]
df_tmp.head(10)


Out[4]:
CRIM ZN INDUS CHAS NOX RM Price
0 0.00632 18.0 2.31 0.0 0.538 6.575 24.0
1 0.02731 0.0 7.07 0.0 0.469 6.421 21.6
2 0.02729 0.0 7.07 0.0 0.469 7.185 34.7
3 0.03237 0.0 2.18 0.0 0.458 6.998 33.4
4 0.06905 0.0 2.18 0.0 0.458 7.147 36.2
5 0.02985 0.0 2.18 0.0 0.458 6.430 28.7
6 0.08829 12.5 7.87 0.0 0.524 6.012 22.9
7 0.14455 12.5 7.87 0.0 0.524 6.172 27.1
8 0.21124 12.5 7.87 0.0 0.524 5.631 16.5
9 0.17004 12.5 7.87 0.0 0.524 6.004 18.9











In [5]:
df_tmp.describe()


Out[5]:
CRIM ZN INDUS CHAS NOX RM Price
count 506.000000 506.000000 506.000000 506.000000 506.000000 506.000000 506.000000
mean 3.593761 11.363636 11.136779 0.069170 0.554695 6.284634 22.532806
std 8.596783 23.322453 6.860353 0.253994 0.115878 0.702617 9.197104
min 0.006320 0.000000 0.460000 0.000000 0.385000 3.561000 5.000000
25% 0.082045 0.000000 5.190000 0.000000 0.449000 5.885500 17.025000
50% 0.256510 0.000000 9.690000 0.000000 0.538000 6.208500 21.200000
75% 3.647423 12.500000 18.100000 0.000000 0.624000 6.623500 25.000000
max 88.976200 100.000000 27.740000 1.000000 0.871000 8.780000 50.000000











In [6]:
from Utils import plot_boston_dataset
plot_boston_dataset(boston_dataset.data, 
                    boston_dataset.target)




















First model: Linear Regression

(Caption from xkcd's "Linear Regression" comic: "The 95% confidence interval suggests Rexthor's dog could also be a cat, or possibly a teapot.")





In [7]:
model = LinearRegression()

model.fit(X, Y)

r2 = model.score(X, Y)

print("R^2 value: {:0.3f}".format(r2))


R^2 value: 0.741










Congratulations!











In [8]:
example_n = np.random.randint(0, Y.shape[0])

Utils.describe_example_boston_dataset(X[example_n])

print("\n\nPredicted price: {:2.2f} Real value: {:2.2f}".format(
        model.predict(X[example_n].reshape(1, -1))[0], Y[example_n]))


Feature: CRIM     -    15.29
Feature: ZN       -     0.00
Feature: INDUS    -    18.10
Feature: CHAS     -     0.00
Feature: NOX      -     0.67
Feature: RM       -     6.65
Feature: AGE      -    93.30
Feature: DIS      -     1.34
Feature: RAD      -    24.00
Feature: TAX      -   666.00
Feature: PTRATIO  -    20.20
Feature: B        -   363.02
Feature: LSTAT    -    23.24


Predicted price: 17.73 Real value: 13.90










Distinguishing Species of Iris plants:



Source: Big Cypress National Preserve











In [9]:
iris_dataset = datasets.load_iris()

print("Features: " + str(iris_dataset.feature_names))
print("Classes: " + str(iris_dataset.target_names))

X = iris_dataset.data
y = iris_dataset.target


Features: ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
Classes: ['setosa' 'versicolor' 'virginica']

In [10]:
# Shuffle the rows and load the data into a DataFrame
idx = np.random.permutation(150)
y = y[idx]
X = X[idx]

labels = np.reshape(y,
                    (y.shape[0], 1))
df = pd.DataFrame(data=np.concatenate((X, labels), axis=1),
                 columns=iris_dataset.feature_names + ['Class'])
df.head(10)


Out[10]:
sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) Class
0 6.4 3.1 5.5 1.8 2.0
1 5.7 4.4 1.5 0.4 0.0
2 4.8 3.0 1.4 0.1 0.0
3 5.6 2.5 3.9 1.1 1.0
4 5.2 3.4 1.4 0.2 0.0
5 6.1 3.0 4.9 1.8 2.0
6 6.5 3.0 5.2 2.0 2.0
7 5.7 2.6 3.5 1.0 1.0
8 5.7 3.0 4.2 1.2 1.0
9 7.4 2.8 6.1 1.9 2.0

In [11]:
df.describe()


Out[11]:
sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) Class
count 150.000000 150.000000 150.000000 150.000000 150.000000
mean 5.843333 3.054000 3.758667 1.198667 1.000000
std 0.828066 0.433594 1.764420 0.763161 0.819232
min 4.300000 2.000000 1.000000 0.100000 0.000000
25% 5.100000 2.800000 1.600000 0.300000 0.000000
50% 5.800000 3.000000 4.350000 1.300000 1.000000
75% 6.400000 3.300000 5.100000 1.800000 2.000000
max 7.900000 4.400000 6.900000 2.500000 2.000000

In [12]:
# Let's take a peek at the data:
plt.figure(figsize=(8,8))
colors = "bry"
for i, color in zip([0, 1, 2], colors):
    idx = np.where(y == i)
    plt.scatter(X[idx, 0], X[idx, 1], c=color, cmap=plt.cm.Paired)
    
plt.text(5.25, 2.20, "Versicolor", fontsize=14)
plt.text(7, 3.5, "Virginica", fontsize=14)
plt.text(4.5, 3.75, "Setosa", fontsize=14)

plt.title("The 3 different Iris species", fontsize=18, 
          fontweight='bold')    
plt.xlabel(iris_dataset.feature_names[0], fontsize=14)
plt.ylabel(iris_dataset.feature_names[1], fontsize=14)

plt.show()



In [13]:
# We will focus on identifying only Iris Setosa
plt.figure(figsize=(8,8))
colors = "br"

idx = np.where(y == 0) # Give me the indices of the Iris Setosa examples

plt.scatter(X[idx, 0], X[idx, 1], c='b', cmap=plt.cm.Paired)
plt.text(4.5, 3.75, "Setosa", fontsize=14)

idx = np.where(y != 0) # where it's not Iris Setosa 
plt.scatter(X[idx, 0], X[idx, 1], c='r', cmap=plt.cm.Paired)
plt.text(7.0, 2.5, "Others", fontsize=14)


plt.title("Scatter plot of Iris Setosa and the others Iris",
          fontsize=18, fontweight='bold')  
plt.xlabel(iris_dataset.feature_names[0], fontsize=14)
plt.ylabel(iris_dataset.feature_names[1], fontsize=14)
plt.show()











Second model: Logistic Regression




In [14]:
# We only care about whether each flower is an
#     Iris Setosa, and we look at only two of its features

X = iris_dataset.data
y = iris_dataset.target

new_y = y == 0

model = LogisticRegression(random_state=42, verbose=0)

model.fit(X[:,0:2], new_y)

accuracy = model.score(X[:,0:2], new_y)

print("Accuracy: {:0.3f}%".format(accuracy*100))


Accuracy: 99.333%

In [15]:
from Utils import predict_mesh

# Let's take a look at what our model is doing

# First plot the examples
plt.figure(figsize=(8,8))
colors = "br"

idx = np.where(y == 0)
plt.scatter(X[idx, 0], X[idx, 1], c='b', cmap=plt.cm.Paired)
plt.text(4.5, 3.75, "Setosa", fontsize=14)

idx = np.where(y != 0)
plt.scatter(X[idx, 0], X[idx, 1], c='r', cmap=plt.cm.Paired)
plt.text(7.0, 2.5, "Others", fontsize=14)

(xx, yy, Z) = predict_mesh(X, model)
plt.contour(xx, yy, Z, cmap=plt.cm.Paired)


plt.title("Decision Boundary", fontsize=18, fontweight='bold')   
plt.xlabel(iris_dataset.feature_names[0], fontsize=14)
plt.ylabel(iris_dataset.feature_names[1], fontsize=14)
plt.show()











Linear Regression and Logistic Regression

So how do these models work?

Let's start with linear regression:

$$ \hat{y} = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 $$

Adding a constant feature $x_0 = 1$, we get

$$ \hat{y} = w^T \cdot x $$

For each variable we have a weight, an "importance", and the linear combination of the weights and features results in our estimated value $\hat{y}$.
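As a quick sanity check of that formula (a small sketch on synthetic data, not on the Boston set), a fitted LinearRegression exposes the weights as model.coef_ and the bias $w_0$ as model.intercept_, and recombining them by hand reproduces model.predict:

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.rand(100, 3)                        # 100 examples, 3 features
y = 4.0 + X @ np.array([1.5, -2.0, 0.7])    # known bias and weights

model = LinearRegression()
model.fit(X, y)

x_new = np.array([0.2, 0.5, 0.9])
manual = model.intercept_ + np.dot(model.coef_, x_new)              # w0 + w^T x
print(np.isclose(manual, model.predict(x_new.reshape(1, -1))[0]))   # True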













What about the weights?















Questions?















Logistic Regression

Same model + Classifier function

Model:

$$ \hat{y} = w^T \cdot x $$



Model + classification:

$$ \hat{y} = g(w^T \cdot x) $$
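To see that this really is the same linear model pushed through a function $g$ (here the sigmoid introduced in the next section), here is a hedged sketch on synthetic data: in scikit-learn, decision_function returns the linear part $w^T \cdot x$, and applying the sigmoid to it reproduces predict_proba for the positive class.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=4, random_state=42)
clf = LogisticRegression(random_state=42)
clf.fit(X, y)

z = clf.decision_function(X)            # the linear part: w^T x (plus bias)
p_manual = 1.0 / (1.0 + np.exp(-z))     # g(z), the sigmoid
p_sklearn = clf.predict_proba(X)[:, 1]  # probability of the positive class

print(np.allclose(p_manual, p_sklearn))  # True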














Sigmoid function

$$g(z) = \frac{1}{1 + e^{-z}}$$






In [16]:
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.arange(-20, 20, 0.001)
y = sigmoid(x)

plt.figure(figsize=(10,5))
plt.plot(x, y)
plt.title("Sigmoid Function", fontsize=14)
plt.show()











Now on to a real-world example!










First look at the data

Data from: Whisky Classified


In [17]:
# Read the data file and drop the columns we don't care about:
whisky_dataframe = pd.read_csv(
    filepath_or_buffer="whiskies.csv", header=0, sep=',',
    index_col=1)
whisky_dataframe.drop(['RowID', 'Postcode', ' Latitude',
                       ' Longitude'], inplace=True, axis=1)
# We get 12 features

whisky_dataframe.head(10)


Out[17]:
Body Sweetness Smoky Medicinal Tobacco Honey Spicy Winey Nutty Malty Fruity Floral
Distillery
Aberfeldy 2 2 2 0 0 2 1 2 2 2 2 2
Aberlour 3 3 1 0 0 4 3 2 2 3 3 2
AnCnoc 1 3 2 0 0 2 0 0 2 2 3 2
Ardbeg 4 1 4 4 0 0 2 0 1 2 1 0
Ardmore 2 2 2 0 0 1 1 1 2 3 1 1
ArranIsleOf 2 3 1 1 0 1 1 1 0 1 1 2
Auchentoshan 0 2 0 0 0 1 1 0 2 2 3 3
Auchroisk 2 3 1 0 0 2 1 2 2 2 2 1
Aultmore 2 2 1 0 0 1 0 0 2 2 2 2
Balblair 2 3 2 1 0 0 2 0 2 1 2 1











In [18]:
whisky_dataframe.describe()


Out[18]:
Body Sweetness Smoky Medicinal Tobacco Honey Spicy Winey Nutty Malty Fruity Floral
count 86.000000 86.000000 86.000000 86.000000 86.000000 86.000000 86.000000 86.000000 86.000000 86.000000 86.000000 86.000000
mean 2.069767 2.290698 1.534884 0.546512 0.116279 1.244186 1.383721 0.976744 1.465116 1.802326 1.802326 1.697674
std 0.930410 0.717287 0.863613 0.990032 0.322439 0.853175 0.784686 0.932760 0.821730 0.629094 0.779438 0.855017
min 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
25% 2.000000 2.000000 1.000000 0.000000 0.000000 1.000000 1.000000 0.000000 1.000000 1.000000 1.000000 1.000000
50% 2.000000 2.000000 1.000000 0.000000 0.000000 1.000000 1.000000 1.000000 2.000000 2.000000 2.000000 2.000000
75% 2.000000 3.000000 2.000000 1.000000 0.000000 2.000000 2.000000 1.000000 2.000000 2.000000 2.000000 2.000000
max 4.000000 4.000000 4.000000 4.000000 1.000000 4.000000 3.000000 4.000000 4.000000 3.000000 3.000000 4.000000











In [19]:
Utils.plot_whisky_histograms(whisky_dataframe)



In [20]:
Utils.plot_whiky_body_correlation(whisky_dataframe)











Let's take a detour




















The Curse of Dimensionality

More features != Better Results

When the dimensionality increases, the volume of the space increases so fast that the available data become sparse.
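One small illustrative sketch of that sparsity (not part of the notebook's Utils helpers): draw the same 30 random points in higher and higher dimensions and the distance to each point's nearest neighbour keeps growing, so every point ends up "alone" in the space.

import numpy as np

rng = np.random.RandomState(42)
n_points = 30

for dim in (1, 2, 10, 100):
    data = rng.rand(n_points, dim)               # points in the unit hypercube
    diffs = data[:, None, :] - data[None, :, :]  # pairwise differences
    dists = np.sqrt((diffs ** 2).sum(axis=-1))   # pairwise Euclidean distances
    np.fill_diagonal(dists, np.inf)              # ignore each point's distance to itself
    print("dim {:>3}: mean nearest-neighbour distance = {:.2f}".format(
        dim, dists.min(axis=1).mean()))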




















In [21]:
Utils.plot_1d_random_data(0.5, 30)



In [22]:
Utils.plot_2d_random_data(0.5, 30)











Feature selection and extraction



















PCA


In [23]:
n = 50

random_data_1 =np.random.multivariate_normal(
    mean= [0, 0], cov=[[3, 3], [0, 0.5]], size=n)

random_data_2 =np.random.multivariate_normal(
    mean= [6, 6], cov=[[3, 3], [0, 0.5]], size=n)

random_data = np.concatenate([random_data_1, random_data_2], axis=0)
random_labels = np.concatenate([np.ones((n,1)),np.zeros((n,1))], axis=0)

fig = plt.figure(figsize=(8, 8))

plt.scatter(random_data[:, 0], random_data[:, 1],
            c=random_labels, cmap=cmap_light)
#plt.scatter(random_data_2[:, 0], random_data_2[:, 1], c='r')

plt.plot([-5, 10], [-5, 10], 'r--')
plt.plot([5, 0], [0, 5], 'g--')

plt.xlim((-7, 14))
plt.ylim((-7, 14))
plt.title('Random Data with Principal Components', fontsize=16)

plt.xlabel('Random Dimension 1', fontsize=14)
plt.ylabel('Random Dimension 2', fontsize=14)

plt.show()



In [24]:
pca = PCA(n_components=2)
pca.fit(random_data)
transformed_data = pca.fit_transform(random_data)

plt.figure(figsize=(8,6))
plt.scatter(transformed_data[:,0], transformed_data[:,1],
            c=random_labels, cmap=cmap_light)
plt.plot([-10, 10], [0, 0], 'r--')
plt.xlim((-10, 10))
plt.ylim((-5, 5))
plt.title('Transformed Random Data', fontsize=16)
plt.xlabel('Random Dimension 1', fontsize=14)
plt.ylabel('Random Dimension 2', fontsize=14)

plt.show()



In [25]:
pca = PCA(n_components=1)
pca.fit(random_data)
transformed_data = pca.fit_transform(random_data)

plt.figure(figsize=(8,5))
plt.scatter(transformed_data[:,0], np.zeros((2 * n ,1)),
            c=random_labels, cmap=cmap_light)
plt.plot([-10, 10], [0, 0], 'r--')
plt.xlim((-10, 10))
plt.ylim((-5, 5))
plt.title('Transformed Random Data', fontsize=16)
plt.xlabel('Random Dimension 1', fontsize=14)

plt.show()

print("% of variance explained by PCA: {:0.1f}% \
        ".format(
        pca.explained_variance_ratio_[0]*100))


% of variance explained by PCA: 98.5%         


















Model complexity and overfitting


In [26]:
### Adapted from: 
# http://scikit-learn.org/stable/auto_examples/
# linear_model/plot_polynomial_interpolation.html 
# Author: Mathieu Blondel
#         Jake Vanderplas
# License: BSD 3 clause

def f(x, noise=False):
    """Sinusoidal function to be approximated."""
    if noise:
        # Use a seeded RandomState so the noise is reproducible
        rng = np.random.RandomState(42)
        return np.sin(x) + rng.randn(x.shape[0]) / 4
    return np.sin(x)

space_size = 2*np.pi

# generate points used to plot
x_plot = np.linspace(-space_size, space_size, 100)

# generate points and keep a subset of them
x = np.linspace(-space_size, space_size, 100)
rng = np.random.RandomState(42)
rng.shuffle(x)
x = np.sort(x[:10])
y = f(x, True)

# create matrix versions of these arrays
X = x[:, np.newaxis]
X_plot = x_plot[:, np.newaxis]

colors = ['teal', 'green', 'brown', 'blue']
lw = 2

fig = plt.figure(figsize=(12,12))
    

for count, degree in enumerate([1, 3, 5, 10]):
    ax = fig.add_subplot(2, 2, count+1)
    ax.plot(x_plot, f(x_plot), color='cornflowerblue', linewidth=lw,
         label="ground truth")
    ax.scatter(x, y, color='navy', s=30, marker='o',
               label="training points")
    model = make_pipeline(PolynomialFeatures(degree), Ridge(random_state=42))
    model.fit(X, y)
    y_plot = model.predict(X_plot)
    ax.plot(x_plot, y_plot, color=colors[count], linewidth=lw,
             label="degree {0} with r^2 = {1:.2f}".format(
            degree, model.score(X,y)))

    ax.legend(loc='lower left', fontsize=14)
    ax.set_ylim((-5, 5))
plt.show()




















Back to Scotch!

Let's apply what we learned to our dataset


In [27]:
whisky_data = whisky_dataframe.values

pca = PCA(n_components=2, whiten=True) 

# whiten=True rescales each principal component to unit variance
# (PCA itself already centers the data before projecting)
transformed_data = pca.fit_transform(whisky_data)

In [28]:
print("% of variance explained by each component: \
       \n 1st {:0.1f}% \
       \n 2nd {:0.1f}% \
        ".format(
        pca.explained_variance_ratio_[0]*100, 
        pca.explained_variance_ratio_[1]*100))


% of variance explained by each component:        
 1st 30.1%        
 2nd 19.2%         

In [29]:
fig = plt.figure(figsize=(8,6))
plt.scatter(x = transformed_data[:,0], y=transformed_data[:,1])

plt.xlim((-3, 5))
plt.ylim((-3, 5))

plt.title('Transformed Whisky Data', fontsize=16)
plt.xlabel('Principal Component 1', fontsize=14)
plt.ylabel('Principal Component 2', fontsize=14)

plt.show()




















Predicting whether a whisky has a Tobacco flavour


In [30]:
labels = whisky_dataframe['Tobacco']
whisky_data = whisky_dataframe.drop('Tobacco', axis=1).values

In [31]:
print("Percentage of Positive Labels: {:.2f}%".format(
        np.sum(labels)/len(labels)*100))


Percentage of Positive Labels: 11.63%













Unbalanced dataset


In [32]:
pca = PCA(n_components=2, whiten=True) 
# whiten=True rescales each principal component to unit variance
# (PCA itself already centers the data before projecting)
transformed_data = pca.fit_transform(whisky_data)

In [33]:
train_data, test_data, train_labels, test_labels = train_test_split(
    transformed_data, labels, test_size=0.30, random_state=0)

# Without Class weights
classf = LogisticRegression()

# With Class weights
class_weight={0:1, 1: 12}
classf = LogisticRegression(class_weight=class_weight)

classf.fit(train_data, train_labels)

accuracy = classf.score(train_data, train_labels)

print("\n\nTraining Accuracy:\t {:0.3f}%\n\n".format(accuracy*100))

accuracy = classf.score(test_data, test_labels)

print("Test Accuracy:\t\t {:0.3f}%\n\n".format(accuracy*100))



Training Accuracy:	 76.667%


Test Accuracy:		 73.077%


Confusion Matrix


In [34]:
print("\tTraining \n")
predicted_labels = classf.predict(train_data)
cm = confusion_matrix(train_labels, predicted_labels)
Utils.print_cm(cm)

print("\n\tTesting \n")

predicted_labels = classf.predict(test_data)
cm = confusion_matrix(test_labels, predicted_labels)
Utils.print_cm(cm)


	Training 

Real Values  |  Predicted Values     
             |  No Tobacco | Tobacco    
No Tobacco   |     42      |    12 
Tobacco      |      2      |     4 

	Testing 

Real Values  |  Predicted Values     
             |  No Tobacco | Tobacco    
No Tobacco   |     17      |     5 
Tobacco      |      2      |     2 















Cross Validation




In [35]:
class_weight={0:1, 1: 12}

classf = LogisticRegression(random_state=42, 
                            class_weight=class_weight)
#classf = LogisticRegression(random_state=42)

In [36]:
# Select parameters to use in Cross-Validation
classf_cv = classf
data_cv = transformed_data
N_CV = 10

# Cross Validation
t0 = time()
scores = cross_val_score(classf_cv, data_cv, labels, cv = N_CV)
print("Scores: ")
for i, score in enumerate(scores):
    print( '\t' + str(i) + ':\t' + str(score)) 
print("Accuracy: %0.3f (+/- %0.3f)" % (scores.mean(), scores.std() * 2))
print("\nCross val done in %0.3fs." % (time() - t0))


Scores: 
	0:	0.666666666667
	1:	0.666666666667
	2:	0.555555555556
	3:	0.666666666667
	4:	0.777777777778
	5:	0.555555555556
	6:	0.5
	7:	0.75
	8:	0.75
	9:	0.5
Accuracy: 0.639 (+/- 0.199)

Cross val done in 0.033s.